ReliAble dependency arc recognition
Authors
Abstract
We propose a novel natural language processing task, ReliAble dependency arc recognition (RADAR), which helps high-level applications make better use of dependency parse trees. We model RADAR as a binary classification problem with imbalanced data, in which each dependency arc is classified as correct or incorrect. A logistic regression classifier with appropriate features is trained to recognize reliable dependency arcs, that is, arcs that are correct with high precision. Experimental results show that the classification method outperforms a probabilistic baseline computed by the original graph-based dependency parser.

As a fundamental task of natural language processing, dependency parsing has become increasingly popular in recent years. It aims to find a dependency parse tree over the words of a sentence. Fig. 1 shows an example of a dependency parse tree for a sentence, where sbj is a subject, obj is an object, etc. (Johansson & Nugues, 2007). Dependency parsing is widely used, for example in biomedical text …

However, when we migrate dependency parsing systems from laboratory demonstrations to high-level applications, even the best parser available today still encounters serious difficulties. First, parsing performance usually degrades dramatically in real-world use because of domain migration. Second, since every parser inevitably makes some mistakes during decoding, the output of any dependency parser contains a variety of errors. Thus, for high-level applications that expect correct parsing results, it is extremely important to be able to predict the reliability of automatically parsed results. If these applications use only correct parsing results and ignore incorrect ones, their performance may improve further.
For instance, if an entity relation extraction system (a kind of information extraction), which depends heavily on parsing results (Zhang, Zhang, Su, & Zhou, 2006), extracts relations only from correctly parsed sentences, then it can extract more accurate relations and introduce fewer wrong relations caused by incorrect parses. Although some relations implied in the incorrectly parsed sentences are missed, these relations may still be extracted from other sentences that are parsed correctly once the data is scaled up to the whole Web. Most large-margin-based training algorithms for dependency parsing output models that predict a single parse tree for the input sentence, with no additional confidence information about its correctness. Therefore, an interesting problem is how to judge whether a parsing result is correct. However, it is difficult to obtain a parse tree in which all sub-structures are parsed correctly. …
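The formulation described above (each arc classified as correct or incorrect by logistic regression, with only high-confidence arcs kept) can be illustrated with a minimal sketch. This is not the authors' implementation: the two features (`parser_score`, `arc_length`), the synthetic data, the plain gradient-descent trainer, and the thresholds are all assumptions made for illustration, and the evaluation is done on the training data only to keep the example short.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Estimated probability that an arc with features x is correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, epochs=500, lr=1.0):
    """Fit logistic regression by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def precision_recall(probs, y, threshold):
    """Precision and recall of the arcs kept as 'reliable' at a threshold."""
    kept = [yi for p, yi in zip(probs, y) if p >= threshold]
    true_pos = sum(kept)
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / sum(y) if sum(y) else 0.0
    return precision, recall


# Synthetic arcs (assumption): two hypothetical features per arc, with most
# arcs correct, which mimics the class imbalance mentioned in the abstract.
random.seed(0)
features, labels = [], []
for _ in range(400):
    parser_score = random.random()  # hypothetical parser confidence in [0, 1]
    arc_length = random.random()    # hypothetical normalized head-dependent distance
    p_correct = sigmoid(6.0 * parser_score - 3.0 * arc_length + 0.5)
    features.append([parser_score, arc_length])
    labels.append(1 if random.random() < p_correct else 0)

w, b = train_logistic(features, labels)
probs = [predict(w, b, x) for x in features]

# Raising the threshold keeps fewer arcs, trading recall for precision.
for t in (0.5, 0.7, 0.9):
    p, r = precision_recall(probs, labels, t)
    print(f"threshold={t:.1f} precision={p:.3f} recall={r:.3f}")
```

A downstream application such as relation extraction would then use only the arcs whose predicted probability exceeds the chosen threshold, accepting lower recall in exchange for fewer wrong arcs.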
Similar resources
Maximal Arc-Length Matching at Multiple Thresholds (MALT) for Variational Shape Recognition
Widely varying shapes are difficult to recognize especially when present as part of an occluded shape. Handling such situations requires the consolidation of data at different levels of confidence, which is the philosophy motivating the Maximal Arc-Length matching at multi-Thresholds algorithm (MALT). The algorithm matches segments from two contours and rates a pair of matches more highly if the ar...
A Dynamic Confusion Score for Dependency Arc Labels
In this paper we propose an approach to dynamically compute a confusion score for dependency arc labels, in typed dependency parsing framework. This score accompanies the parsed output and aims to administer an informed account of parse correctness, detailed down to each edge of the parse. The methodology explores the confusion encountered by the oracle of a data driven parser, in predicting an...
CMU: Arc-Factored, Discriminative Semantic Dependency Parsing
We present an arc-factored statistical model for semantic dependency parsing, as defined by the SemEval 2014 Shared Task 8 on Broad-Coverage Semantic Dependency Parsing. Our entry in the open track placed second in the competition.
Constrained Arc-Eager Dependency Parsing
Arc-eager dependency parsers process sentences in a single left-to-right pass over the input and have linear time complexity with greedy decoding or beam search. We show how such parsers can be constrained to respect two different types of conditions on the output dependency graph: span constraints, which require certain spans to correspond to subtrees of the graph, and arc constraints, which r...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
Journal title:
- Expert Syst. Appl.
Volume 41, Issue
Pages -
Publication date 2014